
Searching protein 3-D structures for optimal structure alignment using intelligent algorithms and data structures


Abstract

In this paper, we present a novel algorithm for measuring protein similarity based on 3-D structure (protein tertiary structure). The algorithm uses a suffix tree to discover the common parts of the main chains of all proteins in the current Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). From these common parts we build a vector model and apply classical information retrieval (IR) algorithms to it to measure the similarity between proteins, that is, an all-to-all protein similarity. To calculate protein similarity, we use the term frequency-inverse document frequency (tf × idf) term weighting scheme and the cosine similarity measure. The goal of this paper is to introduce a new protein similarity metric based on suffix trees and IR methods. We ran the algorithm on the entire current PDB database to demonstrate its very good time complexity and its high precision. We chose the Structural Classification of Proteins (SCOP) database to verify the precision of our algorithm because it is maintained primarily by humans. A further result of this paper is the ability to determine the SCOP categories of proteins not included in the latest version of the SCOP database (v. 1.75) with nearly 100% precision.
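The pipeline described in the abstract (shared main-chain fragments, a vector model, tf × idf weighting, cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. Each main chain is assumed to be already encoded as a string over a hypothetical alphabet of discretized backbone descriptors. Brute-force enumeration of short substrings stands in for the suffix tree, which would find the same shared fragments far more efficiently. The weighting uses raw counts for tf and log(N/df) for idf, since the abstract does not say which variants the paper uses. The names `shared_fragment_counts`, `tfidf`, `cosine`, and `all_to_all` are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def shared_fragment_counts(chains, min_len=2, max_len=4):
    """Count fragments per chain, keeping only those found in >= 2 chains.

    Brute-force substring enumeration is used here in place of a suffix tree.
    """
    per_chain = {}
    for name, seq in chains.items():
        counts = Counter()
        for n in range(min_len, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[seq[i:i + n]] += 1
        per_chain[name] = counts

    # Document frequency: in how many chains does each fragment occur?
    doc_freq = Counter()
    for counts in per_chain.values():
        doc_freq.update(counts.keys())

    shared = {
        name: {f: c for f, c in counts.items() if doc_freq[f] >= 2}
        for name, counts in per_chain.items()
    }
    return shared, doc_freq


def tfidf(term_counts, doc_freq, n_docs):
    """Weight each fragment by raw term frequency times log(N / df)."""
    return {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_to_all(chains):
    """Return cosine similarity for every unordered pair of chains."""
    shared, doc_freq = shared_fragment_counts(chains)
    vectors = {name: tfidf(c, doc_freq, len(chains)) for name, c in shared.items()}
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(chains), 2)
    }


if __name__ == "__main__":
    # Toy, made-up encodings; real input would be derived from PDB coordinates.
    toy_chains = {"p1": "ABCABD", "p2": "ABCXYZ", "p3": "XYZXYW"}
    for pair, score in all_to_all(toy_chains).items():
        print(pair, round(score, 3))
```

In this toy input, `p1` and `p3` share no fragment, so their similarity is zero, while `p2` overlaps partly with each of the other two. Restricting the vocabulary to fragments that occur in at least two chains mirrors the abstract's focus on common parts. How the paper itself selects and weights fragments is not stated in the abstract.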
